Title: Immobilized proteins with specific binding capacities and their use in
processes and products 

Background of the invention 
The pharmaceutical, fine chemicals and food industries need a number of
compounds that have to be isolated from complex mixtures such as extracts of
animal or plant tissue, or fermentation broth. Often these isolation processes
determine the price of the product.

Conventional isolation processes are not very specific, and during the isolation
the compound to be isolated is diluted considerably, with the consequence
that expensive steps for removing water or other solvents have to be applied.

For the isolation of some specific compounds affinity techniques are used. The
advantage of these techniques is that the compounds bind very specifically to a
certain ligand. However, these ligands are quite often very expensive.

To avoid spillage of these expensive ligands they can be linked to an insoluble
support. However, linking the ligand is often also expensive and, moreover, the
functionality of the ligand is often affected negatively by such a procedure.

So a need exists for developing cheap processes for preparing highly effective
immobilized ligands.

Summary of the invention 

The invention provides a method for immobilizing a binding protein capable of
binding to a specific compound, comprising the use of recombinant DNA techniques
for producing said binding protein or a functional part thereof still having said
specific binding capability, said protein or said part thereof being linked to the
outside of a host cell, whereby said binding protein or said part thereof is localized
in the cell wall or at the exterior of the cell wall by allowing the host cell to produce
and secrete a chimeric protein in which said binding protein or said functional part
thereof is bound with its C-terminus to the N-terminus of an anchoring part of an
anchoring protein capable of anchoring in the cell wall of the host cell, which
anchoring part is derivable from the C-terminal part of said anchoring protein.

Preferably, the host is selected from Gram-positive bacteria and fungi, which have a
cell wall at the outside of the host cell, in contrast to Gram-negative bacteria and
cells of higher eukaryotes such as animal cells and plant cells, which have a
membrane at the outside of their cells. Suitable Gram-positive bacteria comprise
lactic acid bacteria and bacteria belonging to the genera Bacillus and Streptomyces.
Suitable fungi comprise yeasts belonging to the genera Candida, Debaryomyces,
Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds belonging to the
genera Aspergillus, Penicillium and Rhizopus. In this specification the group of fungi
comprises the group of yeasts and the group of moulds, which are also known as
lower eukaryotes. In contrast to the cells in plants and animals, the group of bacteria
and lower eukaryotes are also indicated in this specification as microorganisms.

The invention also provides a recombinant polynucleotide capable of being used in a
method as described above, such polynucleotide comprising (i) a structural gene
encoding a binding protein or a functional part thereof still having the specific
binding capability, and (ii) at least part of a gene encoding an anchoring protein
capable of anchoring in the cell wall of a Gram-positive bacterium or a fungus, said
part of a gene encoding at least the anchoring part of said anchoring protein, which
anchoring part is derivable from the C-terminal part of said anchoring protein.
The anchoring protein can be selected from α-agglutinin, a-agglutinin, FLO1, the
Major Cell Wall Protein of a lower eukaryote, and proteinase of lactic acid bacteria.
Preferably, such polynucleotide further comprises a nucleotide sequence encoding a
signal peptide ensuring secretion of the expression product of the polynucleotide,
which signal peptide can be derived from a protein selected from the α-mating
factor of yeast, α-agglutinin of yeast, invertase of Saccharomyces, inulinase of
Kluyveromyces, α-amylase of Bacillus, and proteinase of lactic acid bacteria. The
polynucleotide can be operably linked to a promoter, which is preferably an
inducible promoter.

The invention further provides a recombinant vector comprising a polynucleotide
according to the invention, a chimeric protein encoded by a polynucleotide
according to the invention, and a host cell having a cell wall at the outside of its cell
and containing at least one polynucleotide according to the invention. Preferably at
least one polynucleotide is integrated in the chromosome of the host cell. Another
embodiment of this part of the invention is a host cell having a chimeric protein
according to the invention immobilized in its cell wall and having the binding
protein part of the chimeric protein localized in the cell wall or at the exterior of
the cell wall.

Another embodiment of the invention is a process for carrying out an isolation
process by using an immobilized binding protein or functional part thereof still
capable of binding to a specific compound, wherein a medium containing said
specific compound is contacted with a host cell according to the invention under
conditions whereby a complex between said specific compound and said immobilized
binding protein is formed, after which said complex is separated from the medium
originally containing said specific compound and, optionally, said specific compound
is released from said binding protein or functional part thereof.

Brief description of the figures 
In Figure 1 the composition of pEMBL9-derived plasmid pUR4122 is indicated, the
preparation of which is described in Example 1. 

In Figure 2 the composition of plasmid pUR2741 is indicated, which is a derivative 
of published plasmid pUR2740, see Example 1. 

In Figure 3 the composition of pEMBL9-derived plasmid pUR2968 is indicated. Its
preparation is described in Example 1.

In Figure 4 the preparation of plasmid pUR4174 starting from plasmids pUR2741,
pUR2968 and pUR4122 is indicated, as well as the preparation of plasmid pUR4175
starting from plasmids pSY16, pUR2968 and pUR4122. These preparations are
described in Example 1.

In Figure 5 the composition of plasmid pUR2743.4 is indicated. Its preparation is
described in Example 2. It contains the 714 bp PstI-XhoI fragment given in
SEQ ID NO: 12, which fragment encodes an scFv-TRAS fragment of anti-traseolide®
antibody 02/01/01.

In Figure 6 the composition of plasmid pUR4178 is indicated. Its preparation is
indicated in Example 2. It contains the above mentioned 714 bp PstI-XhoI fragment
given in SEQ ID NO: 12. This plasmid is suitable for the expression of a fusion
protein between scFv-TRAS and αAGG preceded by the invertase signal sequence
(SUC2).

In Figure 7 the composition of plasmid pUR4179 is indicated. Its preparation is
indicated in Example 2. It contains the above mentioned 714 bp PstI-XhoI fragment
given in SEQ ID NO: 12. This plasmid is suitable for the expression of a fusion
protein between scFv-TRAS and αAGG preceded by the prepro-α-mating factor
signal sequence.

In Figure 8 a molecular design picture is given, showing the musk odour molecule
traseolide® and a modified musk antigen, described in Example 3.

In Figure 9 the composition of plasmid pUR4177 is indicated. Its construction is
described in Example 4. Plasmid pUR4177 contains the 734 bp EagI-XhoI DNA
fragment given in SEQ ID NO: 13 encoding the variable regions of the heavy and
light chain fragments from the monoclonal antibody directed against the human
chorionic gonadotropin (an scFv-HCG fragment) and is a 2 μm-based vector
suitable for production of the chimeric scFv-HCG-αAGG fusion protein preceded by
the invertase signal sequence and under the control of the GAL7 promoter.

In Figure 10 the composition of plasmid pUR4180 is indicated. Its preparation is
indicated in Example 4. It contains the above mentioned 734 bp EagI-XhoI DNA
fragment given in SEQ ID NO: 13 and is a 2 μm-based vector suitable for
production of the chimeric scFv-HCG-αAGG fusion protein preceded by the
prepro-α-mating factor signal sequence and under the control of the GAL7 promoter.

In Figure 11 the composition of plasmid pUR2990, a 2 μm-based vector, is
indicated, which is suggested in Example 5 as a starting vector for the preparation of
plasmid pUR4196 (see Figure 12). Plasmid pUR2990 contains a DNA fragment
encoding a chimeric lipase-FLO1 protein that will be anchored in the cell wall of a
lower eukaryote and can catalyze lipid hydrolysis.

In Figure 12 the composition of plasmid pUR4196 is indicated. Its preparation is
explained in Example 5. It contains a DNA fragment encoding a chimeric protein
comprising the scFv-HCG followed by the C-terminal part of the FLO1 protein, and
is a vector suitable for the production of a chimeric protein that is anchored in the
cell wall of the host organism and can bind HCG.

In Figure 13 the composition of plasmid pUR2985 is indicated. Its preparation is
described in Example 6. It contains a choB gene coding for the mature part of the
cholesterol oxidase (EC 1.1.3.6) obtained via PCR techniques from the chromosome
of Brevibacterium sterolicum.

In Figure 14 the composition of plasmid pUR2987 is indicated. Its preparation from
plasmid pUR2985 is described in Example 6. It contains a DNA sequence
comprising the choB gene coding for the mature part of the cholesterol oxidase
preceded by DNA encoding the prepro-α-mating factor signal sequence and
followed by DNA encoding the C-terminal part of α-agglutinin.

In Figure 15 the composition of the published plasmid pGKV550 is indicated. It is
described in Example 7 and contains the complete cell wall proteinase operon of
Lactococcus lactis subsp. cremoris Wg2, including the promoter, the ribosome
binding site and the prtP gene.

In Figure 16 the composition of plasmid pUR2988 is indicated. Its preparation is
described in Example 7. It is anticipated that this plasmid can be used for preparing
a further plasmid pUR2989, which after introduction in a lactic acid bacterium will
be responsible for producing a chimeric protein that will be anchored at the outer
surface of the lactic acid bacterium and is capable of binding cholesterol.

In Figure 17 the composition of plasmid pUR2993 is indicated. Its preparation is
described in Example 8. It is anticipated that this plasmid can be used for
transforming yeast cells that can bind a human epidermal growth factor (EGF)
through an anchored chimeric protein containing an EGF receptor.

In Figure 18 the composition of plasmids pUR4482 and pUR4483 is indicated. Their
preparation is described in Example 9. Plasmid pUR4482 is a yeast episomal
expression plasmid for expression of a fusion protein with the invertase signal
sequence, the CHv09 variable region, the Myc-tail, and the "X-P-X-P" Hinge region
of a camel antibody, and the α-agglutinin cell wall anchor region. Plasmid pUR4483
differs from pUR4482 in that it does not contain the "X-P-X-P" Hinge region.

In Figure 19 immunofluorescent labelling (anti-Myc antibody) of SU10 cells in the
exponential phase (OD530 = 0.5) expressing the genes of camel antibodies present on
plasmids pUR4424, pUR4482 and pUR4483 is shown.
Ph = phase contrast, Fl = fluorescence.

In Figure 20 immunofluorescent labelling (anti-human IgG antibody) of SU10 cells
in the exponential phase (OD530 = 0.5) expressing the genes of camel antibodies
present on plasmids pUR4424, pUR4482 and pUR4483 is shown.
Ph = phase contrast, Fl = fluorescence.



Abbreviations used in the Figures:

α-gal:               gene encoding guar α-galactosidase
AG-alpha-1/AGα1:     gene expressing α-agglutinin from S. cerevisiae
AGα1 cds/α-AGG:      coding sequence of α-agglutinin
Amp/amp r:           β-lactamase resistance gene
CHv09:               camel heavy chain variable 09 fragment
EmR:                 erythromycin resistance gene
f1:                  phage f1 replication sequence
FLO1/FLO (C-part):   C-terminal part of FLO1 coding sequence of flocculation
                     protein
Hinge:               Camel "X-P-X-P" Hinge region, see Example 9
LEU2:                LEU2 gene
LEU2d/Leu2d:         truncated LEU2 gene
Leu 2d cs:           coding sequence LEU2d gene
MycT:                camel Myc-tail
Ori MB1:             origin of replication MB1 derived from E. coli plasmid
Pgal7/pGAL7:         GAL7 promoter
Tpgk:                terminator of the phosphoglyceratekinase gene
ppα-MF/MFα1ss:       prepro-part of α-mating factor (= signal sequence)
repA:                gene encoding the repA protein required for replication
                     (Fig. 15/16)
ScFv (Vh-Vl):        single chain antibody fragment containing VH and VL chains
ss:                  signal sequence
SUC2:                invertase signal sequence
2u/2 micron:         2 μm sequence
Detailed description of the invention

The present invention relates to the isolation of valuable compounds from complex
mixtures by making use of immobilized ligands. The immobilized ligands can be
proteins obtainable via genetic engineering and can consist of two parts, namely
both an anchoring protein or functional part thereof and a binding protein or
functional part thereof.

The anchoring protein sticks into cell walls of microorganisms, preferably lower
eukaryotes, e.g. yeasts and moulds. Often this type of protein has a long C-terminal
part that anchors it in the cell wall. These C-terminal parts have very special amino
acid sequences. A typical example is anchoring via C-terminal sequences of proteins
enriched in proline, see Kok (1990).

The C-terminal part of these anchoring proteins can contain a substantial number of
potential serine and threonine glycosylation sites. O-glycosylation of these sites gives
a rod-like conformation to the C-terminal part of these proteins.

In the case of anchored manno-proteins they seem to be linked to the glucan in the
cell wall of lower eukaryotes, as they cannot be extracted from the cell wall with
sodium dodecyl sulphate (SDS), but can be liberated by glucanase treatment, see
our co-pending patent application WO-94/01567 (UNILEVER) published 20 January
1994 and Schreuder c.s. (1993), both being published after the claimed priority date.

Another mechanism to anchor proteins at the outer side of a cell is to make use of
the property that a protein containing a glycosyl-phosphatidyl-inositol (GPI) group
anchors via this GPI group to the cell surface, see Conzelmann c.s. (1990).

The binding protein is so called, because it ligates or binds to the specific compound
to be isolated. If the N-terminal part of the anchoring protein is sufficiently capable
of binding to a specific compound, the anchoring protein itself can be used in a
process for isolating that specific compound. Suitable examples of a binding protein
comprise an antibody, an antibody fragment, a combination of antibody fragments, a
receptor protein, an inactivated enzyme still capable of binding the corresponding
substrate, and a peptide obtained via Applied Molecular Evolution, see Lewin
(1990), as well as a part of any of these proteinaceous substances still capable of
binding to the specific compound to be isolated. All these binding proteins are
characterized by specific recognition of the compound or group of related
compounds to be isolated. The binding rate and release rate, and therefore the
binding constant between the specific compound to be isolated and the binding
protein, can be regulated either by changing the composition of the liquid extract in
which the compound is present or, preferably, by changing the binding protein by
protein engineering.
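
The relation between binding rate, release rate and binding constant can be illustrated with a minimal sketch. It assumes simple 1:1 binding with mass-action kinetics, which the text does not state explicitly; the function names and all numerical values are hypothetical and serve only as an illustration.

```python
def dissociation_constant(k_on, k_off):
    """Return K_d = k_off / k_on (in M) for 1:1 binding.

    k_on  : binding (association) rate constant, in 1/(M*s)
    k_off : release (dissociation) rate constant, in 1/s
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(free_compound, k_d):
    """Fraction of binding sites occupied at equilibrium (1:1 binding).

    free_compound : free concentration of the specific compound, in M
    k_d           : dissociation constant, in M
    """
    return free_compound / (free_compound + k_d)


# Hypothetical rate constants, for illustration only.
k_d = dissociation_constant(k_on=1.0e5, k_off=1.0e-3)
occupancy = fraction_bound(free_compound=1.0e-8, k_d=k_d)
print(k_d, occupancy)
```

Under these assumptions, a faster release rate (for example after changing the eluting liquid) raises K_d and lowers the occupancy at a given free concentration, which is the handle used for releasing the bound compound.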

The gene coding for the chimeric protein comprising both the binding protein and
the anchoring protein (or functional parts thereof) can be placed under control of a
constitutive, inducible or derepressible promoter and will generally be preceded by a
DNA fragment encoding a signal sequence ensuring efficient secretion of the
chimeric protein. Upon secretion the chimeric protein will be anchored in the cell
wall of the microorganisms, thereby covering the surface of the microorganisms with
the chimeric protein. These microorganisms can be obtained in normal fermentation
processes and their isolation is a cheap process, when physical separation processes
are used, e.g. centrifugation or membrane filtration.

After washing, the isolated microorganisms can be added to liquid extracts
containing the valuable specific compound or compounds. After some time the
equilibrium between the bound and free specific compound(s) will be reached and
the microorganisms to which the specific compound or group of related compounds
is bound can be separated from the extract by simple physical techniques.

Alternatively, the microorganisms covered with ligands can be brought on a support
material and subsequently this coated support material can be used in a column.
The liquid extract containing the specific compound or compounds of interest can be
added to the column and afterwards the compound(s) can be released from the
ligand by changing the composition of the eluting liquid or the temperature or both.

A skilled person will recognize that in addition to these two possibilities other
modifications can be used for effecting the binding of the specific compound and the
ligand, their subsequent isolation and/or the release of the specific compound(s).

In particular the invention relates to chimeric proteins that are bound to the cell
wall of lower eukaryotes. Suitable lower eukaryotes comprise yeasts, e.g. Candida,
Debaryomyces, Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds e.g.
Aspergillus, Penicillium and Rhizopus. For some applications prokaryotes are also
applicable, especially Gram-positive bacteria, examples of which include lactic acid
bacteria, and bacteria belonging to the genera Bacillus and Streptomyces.

For lower eukaryotes the present invention provides genes encoding chimeric
proteins consisting of:

a. a DNA sequence encoding a signal sequence functional in a lower eukaryotic
host, e.g. derived from a yeast protein including the α-mating factor, invertase,
α-agglutinin, inulinase or derived from a mould protein e.g. xylanase;

b. a structural gene encoding a C-terminal part of a cell wall protein preceded by a
structural gene encoding a protein that is capable of binding to the specific
compound or group of compounds of interest, examples of which include

- an antibody,

- a single chain antibody fragment (scFv; see Bird and Webb Walker (1991)),

- a variable region of the heavy chain (VH) or a variable region of the light chain
(VL) of an antibody or that part of such variable region still containing one to
three of the complementarity determining regions (CDRs),

- an agonist-recognizing part of a receptor protein or a part thereof still capable
of binding the agonist,

- a catalytically inactivated enzyme, or a fragment of such enzyme still containing
a substrate binding site of the enzyme,

- specific lipid binding proteins or parts of these proteins still containing the lipid
binding site(s), see Ossendorp (1992), and

- a peptide that has been obtained via Applied Molecular Evolution, see Lewin
(1990).

All expression products of these genes are characterized in that they consist of a
signal sequence and both a protein part that is capable of binding to the
compound(s) to be isolated, and a C-terminus of a typically cell wall bound protein,
examples of the latter including α-agglutinin, see Lipke c.s. (1989), a-agglutinin, see
Roy c.s. (1991), FLO1 (see Example 5 and SEQ ID NO: 14) and the Major Cell
Wall Protein of lower eukaryotes, which C-terminus is capable of anchoring the
expression product in the cell wall of the lower eukaryote host organism.

The expression of these genes encoding chimeric proteins can be under control of a
constitutive promoter, but an inducible promoter is preferred, suitable examples of
which include the GAL7 promoter from Saccharomyces, the inulinase promoter from
Kluyveromyces, the methanol-oxidase promoter from Hansenula, and the xylanase
promoter of Aspergillus. Preferably the constructs are made in such a way that the
new genetic information is integrated in a stable way in the chromosome of the host
cell, see e.g. WO-91/00920 (UNILEVER).

The lower eukaryotes transformed with the above mentioned genes can be grown in
normal fermentation, continuous fermentation, or fed batch fermentation processes.
The selection of a suitable process for growing the microorganism will depend on
the construction of the gene and the promoter used, and on the desired purity of the
cells after the physical separation procedure(s).

For bacteria the present invention deals with genes encoding chimeric proteins
consisting of:

a. a DNA sequence encoding a signal sequence functional in the specific bacterium,
e.g. derived from a Bacillus α-amylase, a Bacillus subtilis subtilisin, or a
Lactococcus lactis subsp. cremoris proteinase;

b. a structural gene encoding a C-terminal part of a cell wall protein preceded by a 
structural gene encoding a protein capable of binding to the specific compound or 
group of compounds of interest, examples of which are given above for a lower 
eukaryote. 

All expression products of these genes are characterized in that they consist of a
signal sequence and both a protein part that is capable of binding to the specific
compound or specific group of compounds to be isolated, and a C-terminus of a
typically cell wall-bound protein such as the proteinase of Lactococcus lactis subsp.
cremoris strain Wg2, see Kok c.s. (1988) and Kok (1990), the C-terminus of which is
capable of anchoring the expression product in the cell wall of the host bacterium.

The invention is illustrated with the following Examples without being limited
thereto. First the endonuclease restriction sites mentioned in the Examples are
given.



Enzyme     Upper strand (5'->3')   Lower strand (3'->5')

BstEII     G GTNACC                CCANTG G
ClaI       AT CGAT                 TAGC TA
EcoRI      G AATTC                 CTTAA G
HindIII    A AGCTT                 TTCGA A
NotI       GC GGCCGC               CGCCGG CG
NruI       TCG CGA                 AGC GCT
SacI       GAGCT C                 C TCGAG
SalI       G TCGAC                 CAGCT G
EagI       C GGCCG                 GCCGG C
NheI       G CTAGC                 CGATC G
PstI       CTGCA G                 G ACGTC
XhoI       C TCGAG                 GAGCT C


Example 1. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
with high specificity lysozyme from a complex mixture.

Lysozyme is an anti-microbial enzyme with a number of applications in the
pharmaceutical and food industries. Several sources of lysozyme are known, e.g. egg
yolk or a fermentation broth containing a microorganism producing lysozyme.
Monoclonal antibodies have been raised against lysozyme, see Ward c.s. (1989), and
the mRNAs encoding the light and heavy chains of such antibodies have been
isolated from the hybridoma cells and used as template for the synthesis of cDNA
using reverse transcriptase. Starting from the plasmids as described by Ward c.s.
(1989), we constructed a pEMBL-derived plasmid, designated pUR4122, in which
the multiple cloning site of the pEMBL-vector, ranging from the EcoRI to the
HindIII site, was replaced by a 231 bp DNA fragment, whose nucleotide sequence is
given in SEQ ID NO: 1 and has an EcoRI site (GAATTC) at nucleotides 1-6, a PstI
site (CTGCAG) at nucleotides 105-110, a BstEII site (GGTCACC) at nucleotides
122-128, an XhoI site (CTCGAG) at nucleotides 207-212, and a HindIII site
(AAGCTT) at nucleotides 226-231.
Construction of pUR4122

Plasmid pEMBL9, see Dente c.s. (1983), was digested with EcoRI and HindIII and
the resulting large fragment was ligated with the double stranded synthetic DNA
fragment given in SEQ ID NO: 1. For the successive ligation of DNA fragments,
which finally form the coding sequence of a single chain antibody fragment for
lysozyme, the following elements were combined in the 231 bp DNA fragment (SEQ
ID NO: 1) inserted into the pEMBL9 vector: the 3' part of the GAL7 promoter,
the invertase signal sequence (SUC2), a PstI restriction site, a BstEII restriction site,
a sequence encoding the (GGGGS)x3 peptide linker connecting the VH and VL
fragments, a SacI restriction site, an XhoI restriction site and a HindIII restriction site,
resulting in plasmid pUR4119. To obtain the in frame fusion between VH and the
GGGGS-linker, plasmid pSW1-VHD1.3-VKD1.3-TAG1, see Ward c.s. (1989), was
digested with PstI and BstEII and a DNA fragment of 0.35 kbp was ligated in the
correspondingly digested pUR4119, resulting in plasmid pUR4119A. Subsequently
the plasmid pSW1-VHD1.3-VKD1.3-TAG1 was digested with SacI and XhoI and
the fragment containing the coding part of VL was finally ligated into the SacI/XhoI
sites of pUR4119A, resulting in plasmid pUR4122 (see Figure 1).

Construction of pUR4174, see Figure 4

To obtain S. cerevisiae episomal expression plasmids containing DNA encoding a cell
wall anchor derived from the C-terminal part of α-agglutinin, plasmid pUR2741 (see
Figure 2) was selected as starting vector. Basically, this plasmid is a derivative of
pUR2740, which is a derivative of plasmid pUR2730 as described in WO-91/19782
(UNILEVER) and by Verbakel (1991). The preparation of pUR2730 is clearly
described in Example 9 of EP-A1-0255153 (UNILEVER). Plasmid pUR2741 differs
from plasmid pUR2740 in that the EagI restriction site within the remaining part of
the already inactive tet resistance gene was deleted through NruI/SalI digestion. The
SalI site was filled in prior to religation.

After digesting pUR4122 with SacI (partially) and HindIII, the approximately 800 bp
fragment was isolated and cloned into the pUR2741 vector fragment, which was
obtained after digestion of pUR2741 with the same enzymes. The resulting plasmid
was named pUR4125.

A plasmid named pUR2968 (see Figure 3) was made by (1) digesting with HindIII
the AGα1-containing plasmid pLα21 published by Lipke c.s. (1989), (2) isolating an
about 6.1 kbp fragment and (3) ligating that fragment with HindIII-treated pEMBL9,
so that the 6.1 kbp fragment was introduced into the HindIII site present in the
multiple cloning site of the pEMBL9 vector.

Plasmid pUR4125 was digested with XhoI and HindIII and the about 8 kbp
fragment was ligated with the approximately 1.4 kbp NheI-HindIII fragment of
pUR2968, using XhoI/NheI adapters having the following sequence:

   XhoI                         NheI
5'-TC GAG ATC AAA GGC GGA TCT G     -3' = SEQ ID NO: 2
3'-     C TAG TTT CCG CCT AGA CGATC -5' = SEQ ID NO: 3.

The plasmid resulting from the ligation of the appropriate parts of plasmids
pUR2968, pUR4125 and XhoI/NheI adapters was designated pUR4174 and encodes
a chimeric fusion protein consisting, starting at the amino terminus, of the invertase
signal (pre) peptide, followed by the scFv-LYS polypeptide and, finally, the C-terminal
part of α-agglutinin (see Figure 4).
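
The pairing of the two adapter strands can be checked mechanically. The following minimal sketch takes the XhoI/NheI adapter strands (SEQ ID NO: 2 and 3), confirms that the lower strand is complementary to the upper strand once the 4-base XhoI-compatible overhang is skipped, and reports the two single-stranded ends. The function names and the `offset` parameter are assumptions introduced for this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement (no reversal)."""
    return "".join(COMPLEMENT[base] for base in seq)


def strands_pair(top_5to3, bottom_3to5, offset):
    """True if the lower strand (written 3'->5') pairs with the upper strand
    starting `offset` bases from the 5' end of the upper strand."""
    overlap = min(len(top_5to3) - offset, len(bottom_3to5))
    paired_top = top_5to3[offset:offset + overlap]
    return complement(paired_top) == bottom_3to5[:overlap]


# Adapter strands as given in the text, spaces removed.
SEQ_ID_2 = "TCGAGATCAAAGGCGGATCTG"   # upper strand, 5'->3'
SEQ_ID_3 = "CTAGTTTCCGCCTAGACGATC"   # lower strand, written 3'->5'

OFFSET = 4  # length of the unpaired 5' end of the upper strand
paired_length = len(SEQ_ID_2) - OFFSET

upper_overhang = SEQ_ID_2[:OFFSET]               # 5'->3'
lower_overhang = SEQ_ID_3[paired_length:][::-1]  # reversed to read 5'->3'

print(strands_pair(SEQ_ID_2, SEQ_ID_3, OFFSET), upper_overhang, lower_overhang)
```

The two overhangs obtained this way correspond to the single-stranded ends left by XhoI and NheI according to the restriction site table, which is what allows the adapter to bridge the XhoI end of pUR4125 and the NheI end of the pUR2968 fragment.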

Construction of pUR4175, see Figure 4

Upon digesting pUR4122 (see above) with PstI and HindIII, the approximately
700 bp fragment was isolated and ligated into a vector fragment of plasmid pSY16,
see Harmsen c.s. (1993), which was digested with EagI and HindIII, using
EagI-PstI adapters having the following sequence:

   EagI                      PstI
5'-G GCC GCC CAG GTG CAG CTG CA -3' = SEQ ID NO: 4
3'-      CGG GTC CAC GTC G      -5' = SEQ ID NO: 5

The resulting plasmid, named pUR4132, was digested with XhoI and HindIII and
ligated with the approximately 1.4 kbp NheI-HindIII fragment of pUR2968 (see
above), using XhoI/NheI adapters as described above, resulting in pUR4175 (see
Figure 4). This plasmid contains a gene encoding a chimeric protein consisting of
the α-mating factor prepro-peptide, followed by the scFv-LYS polypeptide and,
finally, the C-terminal part of α-agglutinin.
Example 2. Construction of genes encoding a series of homologous chimeric
proteins that will be anchored in the cell wall of a lower eukaryote
and are able to bind with high specificities the musk fragrance
traseolide® from a complex mixture.

The isolation of RNA from the hybridoma cell lines, the preparation of cDNA and
amplification of gene fragments encoding the variable regions of antibodies by PCR
were performed according to standard procedures known from the literature, see e.g.
Orlandi c.s. (1989). For the PCR amplification different oligonucleotide primers
have been used.

For the heavy chain fragment:

A: AGG TSM ARC TGC AGS AGT CWG G = SEQ ID NO: 6 (PstI site: CTGCAG),

in which S is C or G, M is A or C, R is A or G, and W is A or T,
and

B: TGA GGA GAC GGT GAC CGT GGT CCC TTG GCC CC = SEQ ID NO: 7 (BstEII site: GGTGACC).

For the light chain fragment (Kappa):

C: GAC ATT GAG CTC ACC CAG TCT CCA = SEQ ID NO: 8 (SacI site: GAGCTC),

and

D: GTT TGA TCT CGA GCT TGG TCC C = SEQ ID NO: 9 (XhoI site: CTCGAG).
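
Primer A is a degenerate primer: each of the letters S, M, R and W stands for two possible bases, so the primer is in fact a mixture of sequences. The following minimal sketch enumerates that mixture, using only the base codes defined in the text; the function name `expand_degenerate` is an assumption introduced for this sketch.

```python
from itertools import product

# Base codes used in primer A, as defined in the text, plus the plain bases.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "M": "AC", "R": "AG", "W": "AT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    primer = primer.replace(" ", "").upper()
    choices = [CODES[base] for base in primer]
    return ["".join(combination) for combination in product(*choices)]


PRIMER_A = "AGG TSM ARC TGC AGS AGT CWG G"  # SEQ ID NO: 6

variants = expand_degenerate(PRIMER_A)
print(len(variants))
# Every variant keeps the PstI site, because no degenerate position lies in it.
print(all("CTGCAG" in variant for variant in variants))
```

With five two-fold degenerate positions the primer represents 32 distinct 22-mers, all of which retain the PstI site used for cloning the heavy chain fragment.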

Construction of pUR4143

To simplify future construction work an EagI restriction site was introduced in
pUR4122 (see above), at the junction between the invertase signal sequence and the
scFv-LYS. This was achieved by replacing the about 110 bp EcoRI-PstI fragment
within the synthetic fragment given in SEQ ID NO: 1 by synthetic adapters with the
following sequence:

EcoRI                  PstI
AATTCGGCCGTTCAGGTGCAGCTGCA = SEQ ID NO: 10
    GCCGGCAAGTCCACGTCG     = SEQ ID NO: 11.
The resulting plasmid was designated pUR4122.1: a construction vector for single
chain Fv assembly in frame behind an EagI site for expression behind either the
prepro-α-mating factor sequence or the SUC2 invertase signal sequence.

After digesting the heavy chain PCR fragment with PstI and BstEII, two fragments
were obtained: a PstI fragment of about 230 bp and a PstI/BstEII fragment of about
110 bp. The latter fragment was cloned into vector pUR4122.1, which was digested
with PstI and BstEII. The newly obtained plasmid (pUR4122.2) was digested with
SacI and XhoI, after which the light chain PCR fragment (digested with the same
restriction enzymes) was cloned into the vector, resulting in pUR4122.3. This
plasmid was digested with PstI, after which the above described about 230 bp PstI
fragment was cloned into the plasmid vector, resulting in a plasmid called pUR4143.
Two orientations are possible, but selection can be made by restriction analysis, as
usual. Instead of the scFv-LYS gene originally present in pUR4122, this new plasmid
pUR4143 contains a gene encoding an scFv-TRAS fragment of anti-traseolide®
antibody 02/01/01 (for the nucleotide sequence of the 714 bp PstI-XhoI fragment
see SEQ ID NO: 12).

Construction of pUR4178 and pUR4179

After digesting pUR4143 with EagI and with HindIII, an about 715 bp fragment can
be isolated. Subsequently, this fragment can be cloned into the vector backbone
fragments of pUR2741 and pUR4175 that were digested with the same restriction
enzymes. In the case of pUR2741, this resulted in plasmid pUR2743.4 (see Figure
5). This plasmid can subsequently be cleaved with XhoI and HindIII and ligated with
the about 8 kbp XhoI-HindIII fragment of pUR4174, resulting in pUR4178 (see
Figure 6).

In the situation where pUR4175 was used as a starting vector, the resulting plasmid
was designated pUR4179 (see Figure 7).

Both plasmids, pUR4178 and pUR4179, were introduced into S. cerevisiae.
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Example 3. The modification of the binding parts or the chimeric protein that 
can bind traseolide® in order to improve the binding or release of 
traseolide® under certain conditions. 

Modification of binding properties of antibodies during the immune response is a 
5 well known immunological phenomenon originating from the fine tuning of 

complementarity determining sequences in the antibody's binding region to the 

antigen's molecular properties. This phenomenon can be mimicked in vitro by 

adjusting the antigen binding regions of antibody fragments based on molecular 

models of these regions in contact with the antigen. 
10 One such example consists of protein engineering the antimusk antibody M02/01/01 

to a stronger binding variant M020501i. 

First, a molecular model of M02/01/01 variable fragment (Fv) was constructed by 
homology modelling, using the coordinates of the anti-lysozyme antibody HYHEL- 
10 as a template (Brookhaven Protein Data Bank entiy: 3HFM). This model was 

15 refined using Molecular Mechanics and Molecular Dynamics methods from within 
the Biosym program DISCOVER, on a Silicon Graphics 4D240 workstation. 
Secondly, the binding site of the resulting Fv was mapped by visually docking the 
musk antigen into the CDR region, followed by a refinement using molecular 
dynamics again. Upon inspection of the resulting model for packing efficiency (van 

20 der Waals contact areas), it was concluded that substitution of ALA H96 by VAL 
would increase the (hydrophobic) contact area between the ligand and Fv, and 
consequently lead to a stronger interaction (see Figure 8). 
When this mutation is introduced into M02/01/01, the cDNA-derived scFv from 
Example 2, the result will be Fv M020501i; a variant with an increased affinity of at 

25 least a factor of 5 can be expected, and the increased affinity could be measured 
using fluorescence titration of the Fv with the musk odour molecule. 
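
To put the expected factor of 5 in energetic terms, the sketch below converts an affinity ratio into a change in binding free energy using the standard relation ΔΔG = -RT ln(ratio). The temperature of 298.15 K and the function name are assumptions made for illustration; no such calculation is reported in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def ddg_from_affinity_ratio(fold: float, temperature_k: float = 298.15) -> float:
    """Change in binding free energy (kcal/mol) for a fold-increase in affinity."""
    return -R_KCAL * temperature_k * math.log(fold)

ddg = ddg_from_affinity_ratio(5.0)
print(f"{ddg:.2f} kcal/mol")
```

Under these assumptions a five-fold gain corresponds to roughly 1 kcal/mol of additional binding energy, which is of the order that a single improved hydrophobic contact might plausibly contribute.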
Example 4. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
hormones such as HCG.

Gene fragments, encoding the variable regions of the heavy and light chain
fragments from the monoclonal antibody directed against the human chorionic
gonadotropin were obtained from a hybridoma cell line in a similar way as described
in Example 2.

Subsequently, these HCG VH and VK gene fragments were cloned into plasmid
pUR4143 by replacing the corresponding PstI-BstEII and SacI-XhoI gene fragments,
resulting in plasmid pUR4146.

Similar to the method described in Example 2, the 734 bp EagI-XhoI fragment
(nucleotide sequence given in SEQ ID NO: 13) encoding the variable regions of the
heavy and light chain fragments from the monoclonal antibody directed against the
human chorionic gonadotropin (an scFv-HCG fragment) was isolated from pUR4146
and was introduced into the vector backbone fragment of pUR4178 (see Example 2)
and will be introduced into the vector backbone fragment of pUR4175 (see Example
1), both digested with the same restriction enzymes. The resulting plasmid
pUR4177 (see Figure 9) was, and pUR4180 (see Figure 10) will be, introduced into
S. cerevisiae strain SU10.



Example 5. Construction of a gene encoding a chimeric scFv-FLO1 protein that
will be anchored in the cell wall of a lower eukaryote and is able to
bind hormones such as HCG.

One of the genes associated with the flocculation phenotype in S. cerevisiae is the
FLO1 gene. The DNA sequence of a clone containing major parts of the FLO1 gene
has been determined, see SEQ ID NO: 14 giving 2685 bp of the FLO1 gene. The
cloned fragment appeared to be approximately 2 kb shorter than the genomic copy
as judged from Southern and Northern hybridizations, but encloses both ends of the
FLO1 gene. Analysis of the DNA sequence data indicates that the putative protein
contains at the N-terminus a hydrophobic region which confirms a signal sequence
for secretion, a hydrophobic C-terminus that might function as a signal for the
attachment of a GPI-anchor and many glycosylation sites, especially in the
C-terminus, with 46.6% serine and threonine in the arbitrarily defined C-terminus
(aa 271-894). Hence, it is likely that the FLO1 gene product is located in an
orientated fashion in the yeast cell wall and may be directly involved in the process
of interaction with neighbouring cells.

The cloned FLO1 sequence might therefore be suitable for the immobilization of
proteins or peptides on the cell surface by a different type of cell wall anchor.
For the production of a chimeric protein comprising the scFv-HCG followed by the
C-terminal part of the FLO1-protein, plasmid pUR2990 (see Figure 11) can be used
as a starting vector. The preparation of episomal plasmid pUR2990 was described in
our co-pending patent application WO-94/01567 (UNILEVER) published on 20
January 1994, i.e. during the priority year. Plasmid pUR2990 comprises the chimeric
gene consisting of the gene encoding the Humicola lipase and a gene encoding the
putative C-terminal cell wall anchor domain of the FLO1 gene product, the chimeric
gene being preceded by the invertase signal sequence (SUC2) and the GAL7
promoter; further the plasmid comprises the yeast 2 μm sequence, the defective
Leu2 promoter described by Eckard and Hollenberg (1983), and the Leu2 gene, see
Roy c.s. (1991). Plasmid pUR4146, described in Example 4, can be digested with
PstI and XhoI, and the about 0.7 kbp PstI-XhoI fragment containing the scFv-HCG
coding sequence can be isolated. For the in frame fusion of this DNA sequence
between the C-terminal FLO1 part and the SUC2 signal sequence, the fragment can
be directly ligated with the 9.3 kbp EagI/NheI (partial) backbone of plasmid
pUR2990, resulting in plasmid pUR4196 (see Figure 12). This plasmid will comprise
an additional triplet encoding Ala at the transition between the SUC2 signal
sequence and the start of the scFv-HCG, and an E-I-K-G-G amino acid sequence in
front of the first amino acid (Ser) of the C part of FLO1 protein.

If in the previous Examples 1-5 the level of exposed antibody fragments is too low,
the production level can be increased by mutagenesis of the framework regions of
the antibody fragment. This can be done in a site directed way or by (targeted)
random mutagenesis, using techniques described in the literature.

Example 6. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
cholesterol.

In the literature two DNA sequences for cholesterol oxidase are described, the choB
gene from Brevibacterium sterolicum, see Ohta c.s. (1991) and the choA gene from
Streptomyces sp. SA-COO, see Ishizaka c.s. (1989). For the construction of a DNA
fusion between the choB gene coding for cholesterol oxidase (EC 1.1.3.6) and the
3' part of the AGα1 gene, the PCR technique on chromosomal DNA can be
applied. Chromosomal DNA can be isolated by standard techniques from
Brevibacterium sterolicum, and the DNA part coding for the mature part of the
cholesterol oxidase can be amplified using the following
corresponding PCR primers cho01pcr and cho02pcr:

cho01pcr

                   5'-GCC CCC AGC CGC ACC CTC G-3' = SEQ ID NO: 16
                   3'-CGG GGG TCG GCG TGG GAG C-5' = SEQ ID NO: 17
                      ||| ||| ||| ||| ||| ||| |
5'-AGATCTGAATTCGCGGCC GCC CCC AGC CGC ACC CTC G-3' = SEQ ID NO: 18
         EcoRI NotI
               EagI

cho02pcr

3'-TAG TAG AGC AGG CTG TAG GTC CGATCG ACT TTCGAA TCTAGA-5' = SEQ ID NO: 19
   ||| ||| ||| ||| ||| ||| |||
5'-ATC ATC TCG TCC GAC ATC CAG-3' = SEQ ID NO: 20
3'-TAG TAG AGC AGG CTG TAG GTC-5' = SEQ ID NO: 21

Both primers can specifically hybridize with the target sequence, thereby amplifying
the coding part of the gene in such a way that the specific PCR product, after
Proteinase K treatment and digestion with EcoRI and HindIII, can be directly
cloned into a suitable vector, here preferably pTZ19R, see Mead c.s. (1986). This
will result in plasmid pUR2985 (see Figure 13).
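
The restriction sites carried by the two primer tails can be listed with a few lines of Python. This is a sketch under stated assumptions: both primers are written 5'->3' (cho02pcr is printed 3'->5' above and is reversed here), one unreadable base in cho01pcr is taken as G so that the primer matches the gene sequence, and only enzymes named in this document are searched for. The helper name `sites_in` is hypothetical.

```python
# Sketch: find recognition sites in the cloning tails of the PCR primers.
SITES = {
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
    "EagI": "CGGCCG",
    "NheI": "GCTAGC",
    "HindIII": "AAGCTT",
}

# Primers written 5'->3'.
CHO01PCR = "AGATCTGAATTCGCGGCCGCCCCCAGCCGCACCCTCG"                 # SEQ ID NO: 18
CHO02PCR = "TAGTAGAGCAGGCTGTAGGTCCGATCGACTTTCGAATCTAGA"[::-1]      # SEQ ID NO: 19 reversed

def sites_in(primer: str) -> dict:
    """Map enzyme name to the 0-based start of its first site in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}

print("cho01pcr:", sites_in(CHO01PCR))
print("cho02pcr:", sites_in(CHO02PCR))
```

The forward primer carries EcoRI and NotI/EagI sites, and the reverse primer carries NheI and HindIII sites, consistent with cloning via EcoRI and HindIII and later in-frame exchange via NotI or EagI and NheI.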

In addition to the already mentioned restriction sites both PCR primers generate
other restriction sites at the 5' end and the 3' end of the 1.5 kbp DNA fragment,
which can be used later on to fuse the fragment in frame between either the SUC2
signal sequence or the prepro-α-mating factor signal sequence on one side and the
C-terminus coding part of the α-agglutinin gene on the other side. To facilitate the
ligation behind the prepro-MF sequence a NotI site is introduced at the 5' end of
PCR oligonucleotide cho01pcr, allowing, for example, the exchange of the 731 bp
EagI/NheI fragment containing the scFv-Lys coding sequence in pUR4175 for the
choB coding sequence.

To create an enzymatically inactive fusion protein between cholesterol oxidase and
α-agglutinin, the above described subcloning into pTZ19R can be used. Cholesterol
oxidase is an FAD-dependent enzyme for which the crystal structure of the
Brevibacterium sterolicum enzyme has been determined, see Vrielink c.s. (1991). The
enzyme displays homology with the typical pattern of the FAD-binding domain with
the Gly-X-Gly-X-X-Gly sequence near the N-terminus (amino acid 18-23).
Site-directed in vitro mutagenesis on the plasmid pUR2985 according to the
manufacturer's protocol (Muta-Gene kit, Bio-Rad) can be applied to inactivate the
FAD-binding site through replacing the triplet(s) encoding the Gly residue(s) by
triplets encoding other amino acids, thereby presumably inactivating the enzyme.
E.g. the following primer can be used for site-directed mutagenesis of 2 of the
conserved Gly residues.

pr  3'- CGG GAG CAG TAG CGG TCA CGT ATG CCG CCA CGG CAG CGG CGC -5'
        ||| ||| ||| ||| | | ||| | | ||| ||| ||| ||| ||| ||| |||
cs  5'- GCC CTC GTC ATC GGC AGT GGA TAC GGC GGT GCC GTC GCC GCG -3'
        Ala Leu Val Ile Gly Ser Gly Tyr Gly Gly Ala Val Ala Ala
                         |       |
                        Ala     Ala

pr = primer = SEQ ID NO: 22
cs = coding strand = SEQ ID NO: 23
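
The effect of the mutagenic primer can be checked by translating the wild-type coding strand and the strand specified by the primer. The sketch below is illustrative only: it uses the standard genetic code, the helper names are hypothetical, and the residue numbering offset (`FIRST_RESIDUE = 14`) is inferred from the statement that the Gly-X-Gly-X-X-Gly motif spans amino acids 18-23.

```python
# Sketch: compare wild-type and primer-encoded amino acid sequences.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (standard genetic code)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def complement(dna: str) -> str:
    """Base-wise complement without reversing the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))

# SEQ ID NO: 23, coding strand, 5'->3'.
CODING = "GCC CTC GTC ATC GGC AGT GGA TAC GGC GGT GCC GTC GCC GCG".replace(" ", "")
# SEQ ID NO: 22, primer as printed (3'->5'); its complement reads 5'->3'.
PRIMER_3TO5 = "CGG GAG CAG TAG CGG TCA CGT ATG CCG CCA CGG CAG CGG CGC".replace(" ", "")

wild = translate(CODING)
mutant = translate(complement(PRIMER_3TO5))

FIRST_RESIDUE = 14  # assumed position of the first codon shown (inferred, see lead-in)
changes = [
    (FIRST_RESIDUE + i, w, m)
    for i, (w, m) in enumerate(zip(wild, mutant))
    if w != m
]
print(wild)
print(mutant)
print(changes)
```

The two single-base mismatches in the primer convert the first two glycines of the Gly-X-Gly-X-X-Gly motif to alanine while leaving all other residues in the window unchanged.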

As a result of the mutagenesis with the described primer, plasmid pUR2986 will be
obtained. From this plasmid the DNA coding for the presumably inactivated
cholesterol oxidase can be released as a 1527 bp fragment through NotI/NheI
digestion, and subsequently directly used to exchange the scFv-Lys coding sequence
in pUR4175, thereby generating plasmid pUR2987 (see Figure 14). To obtain a
variant yeast secretion vector, where the secretion is directed through the SUC2
signal sequence, for example the 1523 bp long SacI/NheI segment of plasmid
pUR2986 can be used to replace the SacI/NheI fragment in pUR4174.
This inactivation of the FAD-binding site might be preferable over other mutations,
since an unchanged active centre can be expected to leave the binding properties of
cholesterol oxidase for cholesterol unaltered. Instead of the described Gly-Ala
exchanges at position 18 and 20 of the mature coding sequence, every other suitable
amino acid change can also be performed.

To inactivate the enzyme, site directed mutagenesis can optionally be performed
directly in the active site cavity, for example through exchange of the Glu331, a
residue appropriately positioned to act as the proton acceptor, thus generating a new
variant of an immobilized, enzymatically inactive fusion protein.




Example 7. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lactic acid bacterium and is able to
bind cholesterol

It has been described that proteinase of Lactococcus lactis subsp. cremoris is
anchored to the cell wall through its 127 amino acid long C-terminal, see Kok c.s.
(1988) and Kok (1990). In a way similar to that described in Example 6, the
cholesterol oxidase of Brevibacterium sterolicum (choB) can be immobilized on the
surface of Lactococcus lactis. Fusions can be made between the choB
structural gene and the N-terminal signal sequence and the C-terminal anchor of the
proteinase of Lactococcus lactis. Plasmid pGKV550 (see Figure 15) contains the
complete proteinase operon of Lactococcus lactis subsp. cremoris Wg2, including the
promoter, a ribosome binding site and DNA fragments encoding the already
mentioned signal and anchor sequences, see Kok (1990). First a DNA fragment,
containing the main part of the signal sequence, flanked by a ClaI site and an EagI
site can be constructed with PCR on pGKV550 as follows:



Primer prt1:
5'-AA GAT CTA TCG ATC TTG TTA GCC GGT ACA-3' = SEQ ID NO: 24
Proteinase gene (non coding strand):
3'-TT CCC GAT AGC TAG AAC AAT CGG CCA TGT CAG-5' = SEQ ID NO: 25
            ClaI

Proteinase gene:   Gln Ala Lys
5'-GTC GGC GAA ATC CAA GCA AAG GCG GCT-3' = SEQ ID NO: 26
Primer prt2:
3'-CAG CCG CTT TAG GTT CGT TGC CGG CCC CCC TTC GAA CCC-5' = SEQ ID NO: 27
                            EagI           HindIII

After the PCR reaction as described in Example 6, the 98 bp long PCR fragment
can be isolated and digested with ClaI and HindIII. pGKV550 can subsequently be
cleaved partially with ClaI and completely with HindIII, after which digestions the
vector fragment, containing the promoter, the ribosome binding site, the DNA
fragment encoding the N-terminal S amino acids and the cell wall binding fragment
containing the 127 C-terminal amino acids of the proteinase gene can be isolated on
gel.

A copy of the cholesterol oxidase gene, suitable for fusion with the prtP anchor
domain can be produced by a PCR reaction using plasmid pUR2985 (Example 6) as
template and a combination of primer cho01pcr (see Example 6) and the following
primer cho03pcr instead of primer cho02pcr:

cho03pcr                            HindIII
3'-TAG TAG AGC AGG CTG TAG GTC CGA GTT CGA ACC TAG GC-5' = SEQ ID NO: 40
   ||| ||| ||| ||| ||| ||| |||
5'-ATC ATC TCG TCC GAC ATC CAG-3' = SEQ ID NO: 20.

The about 1.5 kbp fragment generated by this reaction can be digested with NotI
and HindIII to produce a molecule which can subsequently be ligated with the large
EagI/HindIII fragment from pUR2988 (see Figure 16). The resulting plasmid,
pUR2989, will contain the cholesterol oxidase coding sequence inserted between the
signal sequence and the C-terminal cell wall anchor domain of the proteinase gene.
After introduction into Lactococcus lactis subsp. lactis MG1363 by electroporation,
this plasmid will express cholesterol oxidase under control of the proteinase
promoter. The transport through the membrane will be mediated by the proteinase
signal sequence and the immobilization of the cholesterol oxidase by the proteinase
anchor. As it is unlikely that the Lactococcus will secrete FAD as well, the
cholesterol oxidase will not be active but will be capable of binding cholesterol.



Example 8. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
growth hormones, such as the epidermal growth factor.

For the isolation of larger amounts of human epidermal growth factor (EGF) the
corresponding receptor can be used in the form of a fusion between the binding domain
and a C-terminal part of α-agglutinin as cell wall anchor. The complete cDNA
sequence of the human epidermal growth factor receptor has been cloned and sequenced.
For the construction of a fusion protein with EGF binding capacity the N-terminal part
of the mature receptor up to the central 23 amino acid transmembrane region can be
utilized.

The plasmid pUR4175 can be used for the construction. Through digestion with
EagI and NheI (partial) a 731 bp DNA fragment containing the sequence coding for
scFv is released and can be replaced by a DNA fragment coding for the first 621
amino acids of human epidermal growth factor receptor. Starting from an existing
human cDNA library or otherwise through production of a cDNA library by
standard techniques from preferentially EGF receptor overexpressing cells, e.g. A431
carcinoma cells, see Ullrich c.s. (1984), further PCR can be applied for the
generation of in frame linkage between the extracellular binding domain of the
human growth factor receptor (amino acid 1-622) and the C-terminal part of
α-agglutinin.

PCR oligonucleotides for the in frame linkage of human epidermal growth factor
receptor and the C-terminus of α-agglutinin.



a: PCR oligonucleotides for the transition between SUC2 signal sequence and the
N-terminus of mature EGF receptor.

                            > mature EGF receptor
                        Ala Leu Glu Glu Lys Lys Val
pr EGF1: 5'-GGG GCG GCC GCG CTG GAG GAA AAG AAA GTT TGC-3' = SEQ ID NO: 28
                NotI        ||| ||| ||| ||| ||| ||| |||
         3'-CGC TCA GCC CGA GAC CTC CTT TTC TTT CAA ACG-5'
EGF rec (non-coding strand): = SEQ ID NO: 29

b: PCR oligonucleotides for the in frame transition between C terminus of the
extracellular binding domain of EGF receptor and the C terminal part of
α-agglutinin.

EGF rec (coding strand):
   Asn Gly Pro Lys Ile Pro Ser Ile Ala Thr
5'-AAT GGG CCT AAG ATC CCG TCC ATC GCC ACT-3' = SEQ ID NO: 30
   ||| ||| ||| ||| ||| ||| |||
3'-TTA CCC GGA TTC TAG GGC AGG CGA TCG GAA TTCGAA CCCC-5' = SEQ ID NO: 31
pr EGF2:                       NheI        HindIII

This fusion would result in an addition of 2 Ala amino acids between the signal
sequence and the mature N-terminus of EGF receptor.

The newly obtained 1.9 kbp PCR fragment can be digested with NotI and NheI and
directly ligated into the vector pUR4175 after digesting with the same enzymes,
resulting in plasmid pUR2993 (see Figure 17), comprising the GAL7 promoter, the
prepro-α-mating factor sequence, the chimeric EGF receptor binding domain gene
/ α-agglutinin gene, the yeast 2 μm sequence, the defective LEU2 promoter and the
LEU2 gene. This plasmid can be transformed into S. cerevisiae and the transformed
cells can be cultivated in YP medium whereby expression of the chimeric protein
can be induced by adding galactose to the medium.



Example 9. Construction of genes encoding a chimeric protein anchored to the
cell wall of yeast, comprising a binding domain of a "Camelidae"
heavy chain antibody

Recently it was described that camels as well as a number of related species (e.g.
llamas) contain a considerable amount of IgG antibody molecules which are only
composed of heavy-chain dimers, see Hamers-Casterman c.s. (1993). Although these
"heavy-chain" antibodies are devoid of light chains, it was demonstrated that they
nevertheless have an extensive antigen-binding repertoire. In order to show that the
variable regions of this type of antibodies can be produced and will be linked to the
exterior of the cell wall of a yeast, the following constructs were prepared.

Construction of pUR2997, pUR2998 and pUR2999

The about 2.1 kbp EagI-HindIII fragment of pUR4177 (Example 4, Fig 9) was
isolated. By using PCR technology, an EcoRI restriction site was introduced
immediately upstream of the EagI site, whereby the C of the EcoRI site is the same
as the first C of the EagI site. The thus obtained EcoRI-HindIII fragment was
ligated into plasmid pEMBL9, which was digested with EcoRI and HindIII, which
resulted in pUR4177.A.

The EcoRI/NheI fragment of plasmid pUR4177.A was replaced by the EcoRI/NheI
fragments of three different synthetic DNA fragments (SEQ ID NO: 32, SEQ ID
NO: 33, and SEQ ID NO: 34) resulting in pUR2997, pUR2998 and pUR2999,
respectively. The about 1.5 kbp BstEII-HindIII fragments of pUR2997 and pUR2998
were isolated.



Construction of pUR4423

The multiple cloning site of plasmid pEMBL9, see Dente c.s. (1983), (ranging from
the EcoRI to the HindIII site) was replaced by a synthetic DNA fragment having the
nucleotide sequence given below, see SEQ ID NO: 35 giving the coding strand and
SEQ ID NO: 36 giving the non-coding strand. The 5'-part of this nucleotide
sequence comprises an EagI site, the first 4 codons of a Camelidae VH gene
fragment (nucleotides 16-27) and a XhoI site (CTCGAG) coinciding with codons 5
and 6 (nucleotides 28-33). The 3'-part comprises the last 5 codons of the Camelidae
VH gene (nucleotides 46-60) (part of which coincides with a BstEII site), eleven
codons of the Myc tail (nucleotides 61-93), see SEQ ID NO: 35 containing these
eleven codons and SEQ ID NO: 37 giving the amino acid sequence, and an EcoRI
site (GAATTC). The EcoRI site, originally present in pEMBL9, is not functional
any more, because the 5'-end of the nucleotide sequence contains AATTT instead
of AATTC, indicated below as (EcoRI). The resulting plasmid is called pUR4421.
The Camelidae VH fragment starts with amino acids Q-V-K and ends with amino
acids V-S-S.

  (EcoRI) EagI              XhoI                BstEII
5'-AATTTAGCGG CCGCCCAGGT GAAACTGCTC GAGTAAGTGA CTAAGGTCAC-  50
3'-    ATCGCC GGCGGGTCCA CTTTGACGAG CTCATTCACT GATTCCAGTG-
                    Q V  K

  -CGTCTCCTCA GAACAAAAAC TCATCTCAGA AGAGGATCTG AATTAATGAG- 100
  -GCAGAGGAGT CTTGTTTTTG AGTAGAGTCT TCTCCTAGAC TTAATTACTC-
   V S S      E Q K L I S E E D L N * *        = SEQ ID NO: 37

   EcoRI              HindIII
  -AATTCATCAA ACGGTGATA     -3' 119 = SEQ ID NO: 35
  -TTAAGTAGTT TGCCACTATT CGA-5' 123 = SEQ ID NO: 36
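
The layout of the synthetic fragment can be verified by slicing the coding strand (SEQ ID NO: 35) at the nucleotide positions given in the description and translating each slice. The sketch below is an illustration using the standard genetic code; the helper names are hypothetical and the sequence is transcribed from the fragment shown above.

```python
# Sketch: check the annotated features of the synthetic fragment (SEQ ID NO: 35).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (standard genetic code)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

TOP = (
    "AATTTAGCGG CCGCCCAGGT GAAACTGCTC GAGTAAGTGA CTAAGGTCAC"
    "CGTCTCCTCA GAACAAAAAC TCATCTCAGA AGAGGATCTG AATTAATGAG"
    "AATTCATCAA ACGGTGATA"
).replace(" ", "")

def region(first: int, last: int) -> str:
    """Return nucleotides first..last using the 1-based numbering of the text."""
    return TOP[first - 1:last]

features = {
    "VH start (16-27)": translate(region(16, 27)),
    "XhoI (28-33)": region(28, 33),
    "VH end (46-60)": translate(region(46, 60)),
    "Myc tail (61-93)": translate(region(61, 93)),
    "stops (94-99)": translate(region(94, 99)),
    "EcoRI (100-105)": region(100, 105),
}
for name, value in features.items():
    print(f"{name}: {value}")
```

The slices reproduce the features named in the text: the VH fragment begins Q-V-K and ends V-S-S, the Myc tail reads E-Q-K-L-I-S-E-E-D-L-N, and two stop codons precede the functional EcoRI site.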
(1993), and the α-agglutinin cell wall anchor region. Plasmid pUR4483 differs from
pUR4482 in that it contains the Myc-tail but not the "X-P-X-P" Hinge region.
Similarly, the BstEII-HindIII fragment from pUR2999 can be ligated with the about
6.3 kbp vector fragment and the about 0.44 kbp fragment from pUR4424, resulting
in pUR4497, which will differ from pUR4482 in that it contains the "X-P-X-P" Hinge
region but not the Myc-tail.

The plasmids pUR4424, pUR4482 and pUR4483 were introduced into
Saccharomyces cerevisiae SU10 by electroporation, and transformants were selected
on plates lacking leucine. Transformants from SU10 with pUR4424, pUR4482 or
pUR4483, respectively, were grown on YP with 5% galactose and analysed with
immuno-fluorescence microscopy, as described in Example 1 of our co-pending
WO-94/01567 (UNILEVER) published on 20 January 1994. This method was slightly
modified to detect the chimeric proteins, containing both the camel antibody and
the Myc tail, present at the cell surface.

In one method a monoclonal mouse anti-Myc antibody was used as a first antibody
to bind to the Myc part of the chimeric protein; subsequently a polyclonal
anti-mouse Ig antiserum labeled with fluorescein isothiocyanate (= FITC) ex Sigma,
Product No. F-0527, was used to detect the bound mouse antibody and a positive
signal was determined by fluorescence microscopy.

In the other method a polyclonal rabbit anti-human IgG serum, which had earlier
been proven to cross-react with the camel antibodies, was used as a first antibody to
bind the camel antibody part of the chimeric protein; subsequently a polyclonal
anti-rabbit Ig antiserum labeled with FITC ex Sigma, Product No. F-0382, was used to
detect the bound rabbit antibody and a positive signal was determined by
fluorescence microscopy.

The results in Figure 19 and Figure 20 show clearly that fluorescence can be
observed on those cells in which a fusion protein of the CHV09 fragment with the
α-agglutinin cell wall anchor region is produced (pUR4482 and pUR4483). No
fluorescence, however, was visible on the cells which produce the CHV09 fragment
without this anchor (pUR4424), when viewed under the same circumstances.

requested that a sample of such deposit, when requested, will be submitted to an

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: Unilever N.V.
(B) STREET: [illegible]
(C) CITY: Rotterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3013 AL

(A) NAME: Unilever PLC
(B) STREET: Unilever House, Blackfriars
(C) CITY: London
(E) COUNTRY: United Kingdom
(F) POSTAL CODE (ZIP): EC4P 4BQ

(A) NAME: Leon Gerardus J. FRENKEN
(B) STREET: Geldersestraat 90
(C) CITY: Rotterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3011 MP

(A) NAME: Pieter DE GEUS
(B) STREET: Boeier 24
(C) CITY: Barendrecht
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-2991 KB

(A) NAME: Franciscus Maria KLIS
(B) STREET: Benedenlangs 102
(C) CITY: Amsterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-1025 XL

(A) NAME: Holger York TOSCHKA; c/o Langnese Iglo, BR3
(B) STREET: Aeckern 1
(C) CITY: REKEN
(E) COUNTRY: Germany
(F) POSTAL CODE (ZIP): D-48734

(A) NAME: Cornelis Theodorus VERRIPS
(B) STREET: Hagedoorn 18
(C) CITY: Maassluis
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3142 KB

(ii) TITLE OF INVENTION: Immobilized proteins with specific binding
capacities and their use in processes and products

(iii) NUMBER OF SEQUENCES: 40

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 231 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: fragment in pUR4119

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCGAGC TCATCACACA AACAAACAAA ACAAAATGAT GCTTTTGCAA GCCTTTCTTT 60

TCCTTTTGGC TGGTTTTGCA GCCAAAATAT CTGCGCAGGT GCAGCTGCAG TAATGAACCA 120

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker XhoI-NheI coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

TCGAGATCAA AGCCGGATCT G 21 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker XhoI-NheI non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CTAGCAGATC CCCCTTTGAT C 21 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker EagI-PstI coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GGCCGCCCAG GTGCAGCTGC A 21 



(SEQ ID NO: 1, continued)

CGGTCACCGT CTCCTCAGGT GGAGGCGGTT CAGGCGGAGG TGGCTCTGGC GGTGGCGGAT 180

CGGACATCGA GCTCACTCAG ACCAAGCTCG AGATCAAACG GTGATAAGCT T 231

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: linker EagI-PstI non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCTGCACCTG GGC 13

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PCR primer A (heavy chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

AGGTSMARCT GCAGSAGTCW GG 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE : 

(B) CLONE: PCR primer B (heavy chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

TGAGGAGACG GTGACCGTGG TCCCTTGGCC CC 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PCR primer C (light chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GACATTGAGC TCACCCAGTC TCCA 24

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: PCR primer D (light chain)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GTTTGATCTC GAGCTTGGTC CC 22 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker EcoRI-PstI coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

AATTCGGCCG TTCAGGTCCA GCTGCA 26 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker EcoRI-PstI non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GCTGCACCTG AACGGCCG 18 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 714 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ScFv anti-traseolide 02/01/01

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CTGCAGGAGT CTGGACCTGG CCTGGTGAAA CCTTCTCAGT CTCTGTCCCT CACCTGCACT 60

GTCACTGGCT ACTCAATCAC CAGTGATTTT GCCTGGAACT GGATCCGGCA GTTTCCAGGA 120

AACCAACTGG AGTGGATGGG CTACATAAGC TACAGTGGTA GCACTAGCTA CAACCCATCT 180

CTCAAAAGTC GAATCTCTCT CACTCGAGAC ACATCCAAGA ACCAGTTCTT CCTGCAGTTG 240

AATTCTGTGA CTACTGAGGA CACAGCCACA TATTACTGTG CAACGTCCCT AACATGGTTA 300

CTACGTCGGA AACGTTCTTA CTGGGGCCAA GGGACCACGG TCACCGTCTC CTCAGGTGGA 360

GGCGGTTCAG GCGGAGGTGG CTCTGGCGGT GGCGGATCGG ACATCGAGCT CACCCAGTCT 420

CCATCCTCCA TGTCTGTATC TCTGGGAGAC ACAGTCAGCA TCACTTGCCA TGCAAGTCAG 480

GACATTAGCA GTAATATAGG GTGGTTGCAG CAGAAACCAG GGAAATCATT TAAGGGCCTG 540

ATCTATCATG GAACCAACTT GGAAGATGGT ATTCCATCAA GGTTCAGTGG CAGTGGATCT 600

GGAGCAGATT ATTCCCTCAC CATCAGCAGC CTGGAATCTG AAGATTTTGC AGACTATTAC 660

TGTGTACAGT ATGCTCAGTT TCCATTCACG TTCGGCTCGG GGACCAAGCT CGAG 714



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 734 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ScFv anti-HCG 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CGGCCGTTCA GGTGCAGCTG CAGGAGTCTG GGGGACACTT AGTGAAGCCT GGAGGGTCCC 60

TGAAACTCTC CTGTGCAGCC TCTGGATTCG CTTTCAGTAG CTTTGACATG TCTTGGATTC 120

GCCAGACTCC GGAGAAGAGG CTGGAGTGGG TCGCAAGCAT TACTAATGTT GGTACTTACA 180

CCTACTATCC AGGCAGTGTG AAGGGCCGAT TCTCCATCTC CAGAGACAAT GCCAGGAACA 240

CCCTAAACCT GCAAATGAGC AGTCTGAGGT CTGAGGACAC GGCCTTGTAT TTCTGTGCAA 300

GACAGGGGAC TGCGGCACAA CCTTACTGGT ACTTCGATGT CTGGGGCCAA GGGACCACGG 360

TCACCGTCTC CTCAGGTGGA GGCGGTTCAG GCGGAGGTGG CTCTGGCGGT GGCGGATCGG 420

ACATCGAGCT CACCCAGTCT CCAAAATCCA TGTCCATGTC CGTAGGAGAG AGGGTCACCT 480

TGAGCTGCAA GGCCAGTGAG ACTGTGGATT CTTTTGTGTC CTGGTATCAA CAGAAACCAG 540

AACAGTCTCC TAAATTGTTG ATATTCGGGG CATCCAACCG GTTCAGTGGG GTCCCCGATC 600

GCTTCACTGG CAGTGGATCT GCAACAGACT TCACTCTGAC CATCAGCAGT GTGCAGGCTG 660

AGGACTTTGC GGATTACCAC TGTGGACAGA CTTACAATCA TCCGTATACG TTCGGAGGGG 720

GGACCAAGCT CGAG 734

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2685 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Saccharomyces cerevisiae

(vii) IMMEDIATE SOURCE:

(B) CLONE: pYY105

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..2685

(D) OTHER INFORMATION: /product= "Flocculation protein"
/gene= "FLO1"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

ATG ACA ATG CCT CAT CGC TAT ATG TTT TTG GCA GTC TTT ACA CTT CTG 48
Met Thr Met Pro His Arg Tyr Met Phe Leu Ala Val Phe Thr Leu Leu
1 5 10 15

GCA CTA ACT AGT GTG GCC TCA GGA GCC ACA GAG GCG TGC TTA CCA GCA 96
Ala Leu Thr Ser Val Ala Ser Gly Ala Thr Glu Ala Cys Leu Pro Ala
20 25 30

GGC CAG AGG AAA AGT GGG ATG AAT ATA AAT TTT TAC CAG TAT TCA TTG 144
Gly Gln Arg Lys Ser Gly Met Asn Ile Asn Phe Tyr Gln Tyr Ser Leu
35 40 45

AAA GAT TCC TCC ACA TAT TCG AAT GCA GCA TAT ATG GCT TAT GGA TAT 192
Lys Asp Ser Ser Thr Tyr Ser Asn Ala Ala Tyr Met Ala Tyr Gly Tyr
50 55 60

GCC TCA AAA ACC AAA CTA GGT TCT GTC GGA GGA CAA ACT GAT ATC TCG 240
Ala Ser Lys Thr Lys Leu Gly Ser Val Gly Gly Gln Thr Asp Ile Ser
65 70 75 80

ATT GAT TAT AAT ATT CCC TGT GTT AGT TCA TCA GGC ACA TTT CCT TGT 288
Ile Asp Tyr Asn Ile Pro Cys Val Ser Ser Ser Gly Thr Phe Pro Cys
85 90 95

CCT CAA GAA GAT TCC TAT GGA AAC TGG GGA TGC AAA GGA ATG GGT GCT 336
Pro Gln Glu Asp Ser Tyr Gly Asn Trp Gly Cys Lys Gly Met Gly Ala
100 105 110

TGT TCT AAT AGT CAA GGA ATT GCA TAC TGG AGT ACT GAT TTA TTT GGT 384
Cys Ser Asn Ser Gln Gly Ile Ala Tyr Trp Ser Thr Asp Leu Phe Gly
115 120 125

TTC TAT ACT ACC CCA ACA AAC GTA ACC CTA GAA ATG ACA GGT TAT TTT 432
Phe Tyr Thr Thr Pro Thr Asn Val Thr Leu Glu Met Thr Gly Tyr Phe
130 135 140

TTA CCA CCA CAG ACG GGT TCT TAC ACA TTC AAG TTT GCT ACA GTT GAC 480
Leu Pro Pro Gln Thr Gly Ser Tyr Thr Phe Lys Phe Ala Thr Val Asp
145 150 155 160

GAC TCT GCA ATT CTA TCA GTA GGT GGT GCA ACC GCG TTC AAC TGT TGT 528
Asp Ser Ala Ile Leu Ser Val Gly Gly Ala Thr Ala Phe Asn Cys Cys
165 170 175

GCT CAA CAG CAA CCG CCG ATC ACA TCA ACG AAC TTT ACC ATT GAC GGT 576
Ala Gln Gln Gln Pro Pro Ile Thr Ser Thr Asn Phe Thr Ile Asp Gly
180 185 190

ATC AAG CCA TGG GGT GGA AGT TTG CCA CCT AAT ATC GAA GGA ACC GTC 624
Ile Lys Pro Trp Gly Gly Ser Leu Pro Pro Asn Ile Glu Gly Thr Val
195 200 205

TAT ATG TAC GCT GGC TAC TAT TAT CCA ATG AAG GTT GTT TAC TCG AAC 672
Tyr Met Tyr Ala Gly Tyr Tyr Tyr Pro Met Lys Val Val Tyr Ser Asn
210 215 220

GCT GTT TCT TGG GGT ACA CTT CCA ATT AGT GTG ACA CTT CCA GAT GGT 720
Ala Val Ser Trp Gly Thr Leu Pro Ile Ser Val Thr Leu Pro Asp Gly
225 230 235 240

ACC ACT GTA AGT GAT GAC TTC GAA GGG TAC GTC TAT TCC TTT GAC GAT 768
Thr Thr Val Ser Asp Asp Phe Glu Gly Tyr Val Tyr Ser Phe Asp Asp
245 250 255

GAC CTA AGT CAA TCT AAC TGT ACT GTC CCT GAC CCT TCA AAT TAT GCT 816
Asp Leu Ser Gln Ser Asn Cys Thr Val Pro Asp Pro Ser Asn Tyr Ala
260 265 270

GTC AGT ACC ACT ACA ACT ACA ACG GAA CCA TGG ACC GGT ACT TTC ACT 864
Val Ser Thr Thr Thr Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr
275 280 285

TCT ACA TCT ACT GAA ATG ACC ACC GTC ACC GGT ACC AAC GGC GTT CCA 912
Ser Thr Ser Thr Glu Met Thr Thr Val Thr Gly Thr Asn Gly Val Pro
290 295 300

ACT GAC GAA ACC GTC ATT GTC ATC AGA ACT CCA ACC AGT GAA GGT CTA 960
Thr Asp Glu Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu
305 310 315 320

ATC AGC ACC ACC ACT GAA CCA TGG ACT GGC ACT TTC ACT TCG ACT TCC 1008
Ile Ser Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser
325 330 335

ACT GAG GTT ACC ACC ATC ACT GGA ACC AAC GGT CAA CCA ACT GAC GAA 1056
Thr Glu Val Thr Thr Ile Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu
340 345 350

ACT GTG ATT GTT ATC AGA ACT CCA ACC AGT GAA GGT CTA ATC AGC ACC 1104
Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Ile Ser Thr
355 360 365

ACC ACT GAA CCA TGG ACT GGT ACT TTC ACT TCT ACA TCT ACT GAA ATG 1152
Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met
370 375 380

ACC ACC GTC ACC GGT ACT AAC GGT CAA CCA ACT GAC GAA ACC GTG ATT 1200
Thr Thr Val Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu Thr Val Ile
385 390 395 400

GTT ATC AGA ACT CCA ACC AGT GAA GGT TTG GTT ACA ACC ACC ACT GAA 1248
Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Val Thr Thr Thr Thr Glu
405 410 415

CCA TGG ACT GGT ACT TTT ACT TCG ACT TCC ACT GAA ATG TCT ACT GTC 1296
Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met Ser Thr Val
420 425 430

ACT GGA ACC AAT GGC TTG CCA ACT GAT GAA ACT GTC ATT GTT GTC AAA 1344
Thr Gly Thr Asn Gly Leu Pro Thr Asp Glu Thr Val Ile Val Val Lys
435 440 445

ACT CCA ACT ACT GCC ATC TCA TCC AGT TTG TCA TCA TCA TCT TCA GGA 1392
Thr Pro Thr Thr Ala Ile Ser Ser Ser Leu Ser Ser Ser Ser Ser Gly
450 455 460

CAA ATC ACC AGC TCT ATC ACG TCT TCG CGT CCA ATT ATT ACC CCA TTC 1440
Gln Ile Thr Ser Ser Ile Thr Ser Ser Arg Pro Ile Ile Thr Pro Phe
465 470 475 480

TAT CCT AGC AAT GGA ACT TCT GTG ATT TCT TCC TCA GTA ATT TCT TCC 1488
Tyr Pro Ser Asn Gly Thr Ser Val Ile Ser Ser Ser Val Ile Ser Ser
485 490 495

TCA GTC ACT TCT TCT CTA TTC ACT TCT TCT CCA GTC ATT TCT TCC TCA 1536
Ser Val Thr Ser Ser Leu Phe Thr Ser Ser Pro Val Ile Ser Ser Ser
500 505 510

GTC ATT TCT TCT TCT ACA ACA ACC TCC ACT TCT ATA TTT TCT GAA TCA 1584
Val Ile Ser Ser Ser Thr Thr Thr Ser Thr Ser Ile Phe Ser Glu Ser
515 520 525

TCT AAA TCA TCC GTC ATT CCA ACC AGT AGT TCC ACC TCT GGT TCT TCT 1632
Ser Lys Ser Ser Val Ile Pro Thr Ser Ser Ser Thr Ser Gly Ser Ser
530 535 540

GAG AGC GAA ACG AGT TCA GCT GGT TCT GTC TCT TCT TCC TCT TTT ATC 1680
Glu Ser Glu Thr Ser Ser Ala Gly Ser Val Ser Ser Ser Ser Phe Ile
545 550 555 560

TCT TCT GAA TCA TCA AAA TCT CCT ACA TAT TCT TCT TCA TCA TTA CCA 1728
Ser Ser Glu Ser Ser Lys Ser Pro Thr Tyr Ser Ser Ser Ser Leu Pro
565 570 575

CTT GTT ACC AGT GCG ACA ACA AGC CAG GAA ACT GCT TCT TCA TTA CCA 1776
Leu Val Thr Ser Ala Thr Thr Ser Gln Glu Thr Ala Ser Ser Leu Pro
580 585 590

CCT GCT ACC ACT ACA AAA ACG AGC GAA CAA ACC ACT TTG GTT ACC GTG 1824
Pro Ala Thr Thr Thr Lys Thr Ser Glu Gln Thr Thr Leu Val Thr Val
595 600 605

ACA TCC TGC GAG TCT CAT GTG TGC ACT GAA TCC ATC TCC CCT GCG ATT 1872
Thr Ser Cys Glu Ser His Val Cys Thr Glu Ser Ile Ser Pro Ala Ile
610 615 620

GTT TCC ACA GCT ACT GTT ACT GTT AGC GGC GTC ACA ACA GAG TAT ACC 1920
Val Ser Thr Ala Thr Val Thr Val Ser Gly Val Thr Thr Glu Tyr Thr
625 630 635 640

ACA TGG TGC CCT ATT TCT ACT ACA GAG ACA ACA AAG CAA ACC AAA GGG 1968
Thr Trp Cys Pro Ile Ser Thr Thr Glu Thr Thr Lys Gln Thr Lys Gly
645 650 655

ACA ACA GAG CAA ACC ACA GAA ACA ACA AAA CAA ACC ACG GTA GTT ACA 2016
Thr Thr Glu Gln Thr Thr Glu Thr Thr Lys Gln Thr Thr Val Val Thr
660 665 670

ATT TCT TCT TGT GAA TCT GAC GTA TGC TCT AAG ACT GCT TCT CCA GCC 2064
Ile Ser Ser Cys Glu Ser Asp Val Cys Ser Lys Thr Ala Ser Pro Ala
675 680 685

ATT GTA TCT ACA AGC ACT GCT ACT ATT AAC GGC GTT ACT ACA GAA TAC 2112
Ile Val Ser Thr Ser Thr Ala Thr Ile Asn Gly Val Thr Thr Glu Tyr
690 695 700

ACA ACA TGG TGT CCT ATT TCC ACC ACA GAA TCG AGG CAA CAA ACA ACG 2160
Thr Thr Trp Cys Pro Ile Ser Thr Thr Glu Ser Arg Gln Gln Thr Thr
705 710 715 720

CTA GTT ACT GTT ACT TCC TGC GAA TCT GGT GTG TGT TCC GAA ACT GCT 2208
Leu Val Thr Val Thr Ser Cys Glu Ser Gly Val Cys Ser Glu Thr Ala
725 730 735

TCA CCT GCC ATT GTT TCG ACG GCC ACG GCT ACT GTG AAT GAT GTT GTT 2256
Ser Pro Ala Ile Val Ser Thr Ala Thr Ala Thr Val Asn Asp Val Val
740 745 750

ACG GTC TAT CCT ACA TGG AGG CCA CAG ACT GCG AAT GAA GAG TCT GTC 2304
Thr Val Tyr Pro Thr Trp Arg Pro Gln Thr Ala Asn Glu Glu Ser Val
755 760 765

AGC TCT AAA ATG AAC AGT GCT ACC GGT GAG ACA ACA ACC AAT ACT TTA 2352
Ser Ser Lys Met Asn Ser Ala Thr Gly Glu Thr Thr Thr Asn Thr Leu
770 775 780

GCT GCT GAA ACG ACT ACC AAT ACT GTA GCT GCT GAG ACG ATT ACC AAT 2400
Ala Ala Glu Thr Thr Thr Asn Thr Val Ala Ala Glu Thr Ile Thr Asn
785 790 795 800

ACT GGA GCT GCT GAG ACG AAA ACA GTA GTC ACC TCT TCG CTT TCA AGA 2448
Thr Gly Ala Ala Glu Thr Lys Thr Val Val Thr Ser Ser Leu Ser Arg
805 810 815

TCT AAT CAC GCT GAA ACA CAG ACG GCT TCC GCG ACC GAT GTG ATT GGT 2496
Ser Asn His Ala Glu Thr Gln Thr Ala Ser Ala Thr Asp Val Ile Gly
820 825 830

CAC AGC AGT AGT GTT GTT TCT GTA TCC GAA ACT GGC AAC ACC AAG AGT 2544
His Ser Ser Ser Val Val Ser Val Ser Glu Thr Gly Asn Thr Lys Ser
835 840 845

CTA ACA AGT TCC GGG TTG AGT ACT ATG TCG CAA CAG CCT CGT AGC ACA 2592
Leu Thr Ser Ser Gly Leu Ser Thr Met Ser Gln Gln Pro Arg Ser Thr
850 855 860

CCA GCA AGC AGC ATG GTA GGA TAT AGT ACA GCT TCT TTA GAA ATT TCA 2640
Pro Ala Ser Ser Met Val Gly Tyr Ser Thr Ala Ser Leu Glu Ile Ser
865 870 875 880

ACG TAT GCT GGC AGT GCA ACA GCT TAC TGG CCG GTA GTG GTT TAA 2685
Thr Tyr Ala Gly Ser Ala Thr Ala Tyr Trp Pro Val Val Val

885 890 895 
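
The pairing of codons and residues in SEQ ID NO: 14 can be spot-checked by translation. The following Python sketch is illustrative only and is not part of the original listing: it builds the standard genetic code table and translates the first and last codons of the listed coding sequence. The helper name `translate` is hypothetical, and the use of the standard (nuclear) genetic code is an assumption. The declared lengths are mutually consistent: 2685 base pairs correspond to 895 codons, of which the last is a stop codon, leaving the 894 amino acids of SEQ ID NO: 15.

```python
# Illustrative sketch; names are hypothetical and not part of the listing.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(dna: str) -> list[str]:
    """Translate a DNA string (whitespace ignored) into three-letter codes."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3)]


# First ten codons and last four codons as printed in SEQ ID NO: 14.
first_codons = "ATG ACA ATG CCT CAT CGC TAT ATG TTT TTG"
last_codons = "GTA GTG GTT TAA"

print(" ".join(translate(first_codons)))
print(" ".join(translate(last_codons)))
```

Applying the same function row by row is one way to flag codons whose printed residue does not match the translation.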

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 894 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Thr Met Pro His Arg Tyr Met Phe Leu Ala Val Phe Thr Leu Leu
1 5 10 15

Ala Leu Thr Ser Val Ala Ser Gly Ala Thr Glu Ala Cys Leu Pro Ala
20 25 30

Gly Gln Arg Lys Ser Gly Met Asn Ile Asn Phe Tyr Gln Tyr Ser Leu
35 40 45

Lys Asp Ser Ser Thr Tyr Ser Asn Ala Ala Tyr Met Ala Tyr Gly Tyr
50 55 60



WO 94/18330 PCT/EP94/00427 




Ala Ser Lys Thr Lys Leu Gly Ser Val Gly Gly Gln Thr Asp Ile Ser
65 70 75 80

Ile Asp Tyr Asn Ile Pro Cys Val Ser Ser Ser Gly Thr Phe Pro Cys
85 90 95

Pro Gln Glu Asp Ser Tyr Gly Asn Trp Gly Cys Lys Gly Met Gly Ala
100 105 110

Cys Ser Asn Ser Gln Gly Ile Ala Tyr Trp Ser Thr Asp Leu Phe Gly
115 120 125

Phe Tyr Thr Thr Pro Thr Asn Val Thr Leu Glu Met Thr Gly Tyr Phe
130 135 140

Leu Pro Pro Gln Thr Gly Ser Tyr Thr Phe Lys Phe Ala Thr Val Asp
145 150 155 160

Asp Ser Ala Ile Leu Ser Val Gly Gly Ala Thr Ala Phe Asn Cys Cys
165 170 175

Ala Gln Gln Gln Pro Pro Ile Thr Ser Thr Asn Phe Thr Ile Asp Gly
180 185 190

Ile Lys Pro Trp Gly Gly Ser Leu Pro Pro Asn Ile Glu Gly Thr Val
195 200 205

Tyr Met Tyr Ala Gly Tyr Tyr Tyr Pro Met Lys Val Val Tyr Ser Asn
210 215 220

Ala Val Ser Trp Gly Thr Leu Pro Ile Ser Val Thr Leu Pro Asp Gly
225 230 235 240

Thr Thr Val Ser Asp Asp Phe Glu Gly Tyr Val Tyr Ser Phe Asp Asp
245 250 255

Asp Leu Ser Gln Ser Asn Cys Thr Val Pro Asp Pro Ser Asn Tyr Ala
260 265 270

Val Ser Thr Thr Thr Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr
275 280 285

Ser Thr Ser Thr Glu Met Thr Thr Val Thr Gly Thr Asn Gly Val Pro
290 295 300

Thr Asp Glu Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu
305 310 315 320

Ile Ser Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser
325 330 335

Thr Glu Val Thr Thr Ile Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu
340 345 350

Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Ile Ser Thr
355 360 365

Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met
370 375 380

Thr Thr Val Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu Thr Val Ile
385 390 395 400

Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Val Thr Thr Thr Thr Glu
405 410 415

Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met Ser Thr Val
420 425 430

Thr Gly Thr Asn Gly Leu Pro Thr Asp Glu Thr Val Ile Val Val Lys
435 440 445

Thr Pro Thr Thr Ala Ile Ser Ser Ser Leu Ser Ser Ser Ser Ser Gly
450 455 460

Gln Ile Thr Ser Ser Ile Thr Ser Ser Arg Pro Ile Ile Thr Pro Phe
465 470 475 480

Tyr Pro Ser Asn Gly Thr Ser Val Ile Ser Ser Ser Val Ile Ser Ser
485 490 495

Ser Val Thr Ser Ser Leu Phe Thr Ser Ser Pro Val Ile Ser Ser Ser
500 505 510

Val Ile Ser Ser Ser Thr Thr Thr Ser Thr Ser Ile Phe Ser Glu Ser
515 520 525

Ser Lys Ser Ser Val Ile Pro Thr Ser Ser Ser Thr Ser Gly Ser Ser
530 535 540

Glu Ser Glu Thr Ser Ser Ala Gly Ser Val Ser Ser Ser Ser Phe Ile
545 550 555 560

Ser Ser Glu Ser Ser Lys Ser Pro Thr Tyr Ser Ser Ser Ser Leu Pro
565 570 575

Leu Val Thr Ser Ala Thr Thr Ser Gln Glu Thr Ala Ser Ser Leu Pro
580 585 590

Pro Ala Thr Thr Thr Lys Thr Ser Glu Gln Thr Thr Leu Val Thr Val
595 600 605

Thr Ser Cys Glu Ser His Val Cys Thr Glu Ser Ile Ser Pro Ala Ile
610 615 620

Val Ser Thr Ala Thr Val Thr Val Ser Gly Val Thr Thr Glu Tyr Thr
625 630 635 640

Thr Trp Cys Pro Ile Ser Thr Thr Glu Thr Thr Lys Gln Thr Lys Gly
645 650 655

Thr Thr Glu Gln Thr Thr Glu Thr Thr Lys Gln Thr Thr Val Val Thr
660 665 670

Ile Ser Ser Cys Glu Ser Asp Val Cys Ser Lys Thr Ala Ser Pro Ala
675 680 685

Ile Val Ser Thr Ser Thr Ala Thr Ile Asn Gly Val Thr Thr Glu Tyr
690 695 700

Thr Thr Trp Cys Pro Ile Ser Thr Thr Glu Ser Arg Gln Gln Thr Thr
705 710 715 720

Leu Val Thr Val Thr Ser Cys Glu Ser Gly Val Cys Ser Glu Thr Ala
725 730 735

Ser Pro Ala Ile Val Ser Thr Ala Thr Ala Thr Val Asn Asp Val Val
740 745 750

Thr Val Tyr Pro Thr Trp Arg Pro Gln Thr Ala Asn Glu Glu Ser Val
755 760 765

Ser Ser Lys Met Asn Ser Ala Thr Gly Glu Thr Thr Thr Asn Thr Leu
770 775 780

Ala Ala Glu Thr Thr Thr Asn Thr Val Ala Ala Glu Thr Ile Thr Asn
785 790 795 800

Thr Gly Ala Ala Glu Thr Lys Thr Val Val Thr Ser Ser Leu Ser Arg
805 810 815

Ser Asn His Ala Glu Thr Gln Thr Ala Ser Ala Thr Asp Val Ile Gly
820 825 830

His Ser Ser Ser Val Val Ser Val Ser Glu Thr Gly Asn Thr Lys Ser
835 840 845

Leu Thr Ser Ser Gly Leu Ser Thr Met Ser Gln Gln Pro Arg Ser Thr
850 855 860

Pro Ala Ser Ser Met Val Gly Tyr Ser Thr Ala Ser Leu Glu Ile Ser
865 870 875 880

Thr Tyr Ala Gly Ser Ala Thr Ala Tyr Trp Pro Val Val Val
885 890

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GCCCCCAGCC GCACCCTCG 19 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ChoB template non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CGAGGGTGCG GCTGGGGGC 19

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: cho01pcr primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

AGATCTGAAT TCGCGGCCGC CCCCAGCCGC ACCCTCG 37

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: cho02pcr primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

AGATCTAAGC TTTCAGCTAG CCTGGATGTC GGACGAGATG AT 42

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ChoB template coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

ATCATCTCGT CCGACATCCA G 21

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ChoB template non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

CTGGATGTCG GACGAGATGA T 21

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: mutagenesis primer ChoB 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

CGCGGCGACG GCACCGCCGT ATGCACTGGC GATGACGAGG GC 42 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ChoB template coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

GCCCTCGTCA TCGGCAGTGG ATACGGCGGT GCCGTCGCCG CG 42 
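
The relationship between the coding and non-coding template strands, and between the mutagenesis primer (SEQ ID NO: 22) and its template (SEQ ID NO: 23), can be checked mechanically. The following Python sketch is illustrative only and is not part of the original listing; the helper names are hypothetical. It reverse-complements a strand and reports the 1-based positions at which the primer differs from the strand it would anneal to.

```python
# Illustrative sketch only; helper names are hypothetical, not from the patent.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer: str, target: str) -> list[int]:
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(primer) != len(target):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(primer, target)) if a != b]


# Sequences as printed in the listing, with the spacing removed.
SEQ_16 = "GCCCCCAGCCGCACCCTCG"  # ChoB template coding strand
SEQ_17 = "CGAGGGTGCGGCTGGGGGC"  # ChoB template non-coding strand
SEQ_22 = "CGCGGCGACGGCACCGCCGTATGCACTGGCGATGACGAGGGC"  # mutagenesis primer ChoB
SEQ_23 = "GCCCTCGTCATCGGCAGTGGATACGGCGGTGCCGTCGCCGCG"  # ChoB template coding strand

# The two template strands should be exact reverse complements of each other.
strands_pair = reverse_complement(SEQ_16) == SEQ_17

# The mutagenesis primer matches the complement of the coding strand except
# at the positions where it introduces base changes.
primer_mismatches = mismatches(SEQ_22, reverse_complement(SEQ_23))

print(strands_pair)
print(primer_mismatches)
```

With the sequences as printed, the primer differs from the reverse complement of the template at two positions, which is the pattern expected of a primer that introduces point changes.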

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: primer prtl 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

AAGATCTATC GATCTTGTTA GCCGGTACA 29 
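
Several of the PCR primers in this listing carry restriction sites at their 5' ends. The following Python sketch, offered as an illustration rather than as part of the original listing, scans primer sequences for a small set of recognition sequences. The enzyme table is an assumption added for this example: this section does not state which enzymes were intended for each primer, and the function name `find_sites` is hypothetical.

```python
# Illustrative sketch; the enzyme table is an assumption added for this example.

RECOGNITION_SITES = {
    "BglII": "AGATCT",
    "ClaI": "ATCGAT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NheI": "GCTAGC",
    "NotI": "GCGGCCGC",
}


def find_sites(seq: str, sites: dict[str, str] = RECOGNITION_SITES) -> dict[str, list[int]]:
    """Map each enzyme to the 1-based start positions of its site in seq."""
    hits: dict[str, list[int]] = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Primer sequences as printed in the listing, with the spacing removed.
PRIMERS = {
    "SEQ ID NO: 18": "AGATCTGAATTCGCGGCCGCCCCCAGCCGCACCCTCG",
    "SEQ ID NO: 19": "AGATCTAAGCTTTCAGCTAGCCTGGATGTCGGACGAGATGAT",
    "SEQ ID NO: 24": "AAGATCTATCGATCTTGTTAGCCGGTACA",
}

for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Only exact matches on the printed strand are reported; sites containing degenerate bases would need a pattern-based search instead.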

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: proteinase template non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GACTGTACCG GCTAACAAGA TCGATAGCCC TT 32

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: proteinase template coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GTCGGCGAAA TCCAAGCAAA GGCGGCT 27

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: prt2 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CCCAAGCTTC CCCCCGGCCG TTGCTTGGAT TTCGCCGAC 39

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF1 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

GGGGCGGCCG CGCTGGAGGA AAAGAAAGTT TGC 33

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF receptor template non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

GCAAACTTTC TTTTCCTCCA GAGCCCGACT CGC 33

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF receptor template coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

AATGGGCCTA AGATCCCGTC CATCGCCACT 30

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF2 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CCCCAAGCTT AAGGCTAGCG GACGGGATCT TAGGCCCATT 40 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 177 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: VhC - AGα1 linker with MycT and Hinge

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

GAATTCCAGG TCACCGTCTC CTCAGAACAA AAACTCATCT CAGAAGAGGA TCTGAATGAA 60 

CCAAAGATTC CACAACCTCA ACCAAAGCCA CAACCTCAAC CACAACCACA ACCAAAACCT 120 

CAACCAAAGC CAGAACCAGA ATCTACTTCC CCAAAGTCTC CAGCTAGCCT TAAGCTT 177 
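
Each row of a nucleotide listing ends with a running base count, which gives a simple integrity check on a transcribed sequence. The following Python sketch is illustrative only and not part of the original listing; the function name `join_listing_rows` is hypothetical. It joins the rows of SEQ ID NO: 32 as printed above and confirms that every running total, and therefore the declared length of 177 base pairs, agrees with the bases actually present.

```python
# Illustrative sketch; the function name is hypothetical.
import re


def join_listing_rows(rows: list[str]) -> str:
    """Concatenate listing rows, verifying each row's running base count."""
    sequence = ""
    for row in rows:
        # The last whitespace-separated field is the running total.
        *groups, running_total = row.split()
        chunk = "".join(groups)
        if not re.fullmatch(r"[ACGT]+", chunk):
            raise ValueError(f"unexpected character in row: {row!r}")
        sequence += chunk
        if len(sequence) != int(running_total):
            raise ValueError(
                f"row ends at base {len(sequence)}, listing says {running_total}"
            )
    return sequence


SEQ_32_ROWS = [
    "GAATTCCAGG TCACCGTCTC CTCAGAACAA AAACTCATCT CAGAAGAGGA TCTGAATGAA 60",
    "CCAAAGATTC CACAACCTCA ACCAAAGCCA CAACCTCAAC CACAACCACA ACCAAAACCT 120",
    "CAACCAAAGC CAGAACCAGA ATCTACTTCC CCAAAGTCTC CAGCTAGCCT TAAGCTT 177",
]

seq_32 = join_listing_rows(SEQ_32_ROWS)
print(len(seq_32))
```

A row containing a stray digit or punctuation mark, as can arise from character recognition, raises a `ValueError` rather than passing silently.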

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: VhC - AGα1 linker with MycT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

GAATTCCAGG TCACCGTCTC CTCAGAACAA AAACTCATCT CAGAAGAGGA TCTGAATGCT 60

AGC 63



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 144 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: VhC - AGα1 linker with Hinge

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

GAATTCCAGG TCACCGTCTC CTCAGAACCA AAGATTCCAC AACCTCAACC AAAGCCACAA 60

CCTCAACCAC AACCACAACC AAAACCTCAA CCAAAGCCAG AACCAGAATC TACTTCCCCA 120

AAGTCTCCAG CTAGCCTTAA GCTT 144

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: fragment in pUR4421 coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

AATTTAGCGG CCGCCCAGGT GAAACTGCTC GAGTAAGTGA CTAAGGTCAC CGTCTCCTCA 60 

GAACAAAAAC TCATCTCAGA AGAGGATCTG AATTAATGAG AATTCATCAA ACGGTGATA 119 

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: fragment in pUR4421 non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

AGCTTATCAC CGTTTGATGA ATTCTCATTA ATTCAGATCC TCTTCTGAGA TGAGTTTTTG 60 

TTCTGAGGAG ACGGTGACCT TAGTCACTTA CTCGAGCAGT TTCACCTGGG CGGCCGCTA 119

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: Myc tail 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn
1 5 10 
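
For comparison with sequence databases, which generally use one-letter residue codes, the three-letter notation of the listing can be converted directly. The following Python sketch is illustrative only and not part of the original listing; the function name `to_one_letter` is hypothetical. It converts the Myc tail of SEQ ID NO: 37 to one-letter code.

```python
# Illustrative sketch; the function name is hypothetical.

ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(ONE_LETTER[name] for name in residues.split())


MYC_TAIL = "Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn"  # SEQ ID NO: 37

print(to_one_letter(MYC_TAIL))
```

An unrecognised code, such as a misread "Gin" or "lie", raises a `KeyError`, which makes the conversion a convenient check on transcription errors.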

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: BstEII-HindIII linker coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

GTCACCGTCT CCTCATAATG A 21 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE: 

(B) CLONE: BstEII-HindIII linker non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

AGCTTCATTA TGAGGAGACG 20 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: primer cho03pcr

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

CGGATCCAAG CTTGAGCCTG GATGTCGGAC GAGATGAT 38

CLAIMS

1. A method for immobilizing a binding protein capable of binding to a specific
compound, comprising the use of recombinant DNA techniques for producing
said binding protein or a functional part thereof still having said specific binding
capability, said protein or said part thereof being linked to the outside of a host cell,
whereby said binding protein or said part thereof is localized in the cell wall or at 
the exterior of the cell wall by allowing the host cell to produce and secrete a 
chimeric protein in which said binding protein or said functional part thereof is 
bound with its C-terminus to the N-terminus of an anchoring part of an anchoring 
protein capable of anchoring in the cell wall of the host cell, which anchoring part is 
derivable from the C-terminal part of said anchoring protein. 

2. The method of claim 1, in which the host is selected from the group 
consisting of Gram-positive bacteria and fungi. 

3. The method of claim 2, in which the host is a Gram-positive bacterium 
selected from the group consisting of lactic acid bacteria, and bacteria belonging to 
the genera Bacillus and Streptomyces. 

4. The method of claim 2, in which the host is a fungus selected from the
group consisting of yeasts belonging to the genera Candida, Debaryomyces,
Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds belonging to the
genera Aspergillus, Penicillium and Rhizopus.

5. A recombinant polynucleotide comprising 

(i) a structural gene encoding a binding protein or a functional part thereof 
still having the specific binding capability, and 

(ii) at least part of a gene encoding an anchoring protein capable of anchoring 
in the cell wall of a Gram-positive bacterium or a fungus, said part of a 
gene encoding at least the anchoring part of said anchoring protein, which
anchoring part is derivable from the C-terminal part of said anchoring
protein.

6. The polynucleotide of claim 5, wherein the anchoring protein is selected
from the group consisting of α-agglutinin, a-agglutinin, FLO1, the Major Cell Wall
Protein of a fungus, and proteinase of lactic acid bacteria.

7. The polynucleotide of claim 5, further comprising a nucleotide sequence 
encoding a signal peptide ensuring secretion of the expression product of the 
polynucleotide. 

8. The polynucleotide of claim 7, wherein the signal peptide is derived from a
protein selected from the group consisting of the α-mating factor of yeast,
α-agglutinin of yeast, invertase of Saccharomyces, inulinase of Kluyveromyces,
α-amylase of Bacillus, and proteinase of lactic acid bacteria.

9. The polynucleotide of any of claims 5-8, operably linked to a promoter, 
which can be an inducible promoter. 

10. A recombinant vector comprising a polynucleotide as claimed in any of 
claims 5-9. 

11. A chimeric protein encoded by a polynucleotide as claimed in any of 
claims 5-9. 

12. A host cell having a cell wall at the outside of its cell and containing at 
least one polynucleotide as claimed in any of claims 5-9. 

13. The host cell of claim 12, having at least one polynucleotide as claimed in 
any of claims 5-9 integrated in its chromosome.

14. A host cell having a chimeric protein as claimed in claim 11 immobilized
in its cell wall and having the binding protein part of the chimeric protein localized 
in the cell wall or at the exterior of the cell wall. 

15. The host cell of any of claims 12-14, which is a fungus selected from the 
group consisting of yeasts and moulds. 

16. A process for carrying out an isolation process by using an immobilized 
binding protein or functional part thereof still capable of binding to a specific 
compound, wherein a medium containing said specific compound is contacted with a 
host cell as claimed in any of claims 12-15 under conditions whereby a complex 
between said specific compound and said immobilized binding protein is formed, 
separating said complex from the medium originally containing said specific 
compound and, optionally, releasing said specific compound from said binding 
protein or functional part thereof.